Identifying Patterns for Unsupervised Grammar Induction
Abstract
This paper describes a new method for unsupervised grammar induction based on the automatic extraction of certain patterns in texts. Our starting hypothesis is that some classes of words function as separators, marking the beginning or the end of new constituents. Among these separators we distinguish those that trigger new levels in the parse tree. If we can detect these separators, a very simple procedure identifies the constituents of a sentence by taking the classes of words between separators. This paper describes the process we followed to automatically identify the set of separators from a corpus annotated only with Part-of-Speech (POS) tags. The proposed approach improves on the results of previous proposals when parsing sentences from the Wall Street Journal corpus.
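The separator-based procedure sketched in the abstract can be illustrated with a short example. This is a minimal sketch under stated assumptions, not the paper's method: the separator sets below (`SEPARATORS`, `LEVEL_TRIGGERS`) are hypothetical placeholders, whereas the paper learns its separator inventory automatically from POS-tag statistics.

```python
# Hypothetical separator sets for illustration only; the paper induces
# these automatically from a POS-tagged corpus.
SEPARATORS = {"DT", "IN"}       # tags assumed to mark a constituent boundary
LEVEL_TRIGGERS = {"IN"}         # tags assumed to open a new level in the tree


def segment(tags, separators=SEPARATORS, triggers=LEVEL_TRIGGERS):
    """Group the POS tags between separators into constituents.

    A separator closes the current constituent and starts a new one;
    a level-triggering separator additionally nests everything that
    follows it one level deeper in the tree.
    """
    result, current = [], []
    idx = 0
    while idx < len(tags):
        tag = tags[idx]
        if tag in separators:
            if current:                  # close the constituent built so far
                result.append(current)
                current = []
            if tag in triggers:          # open a nested level for the rest
                result.append([tag] + segment(tags[idx + 1:],
                                              separators, triggers))
                return result
            current = [tag]              # plain separator begins a constituent
        else:
            current.append(tag)          # non-separator extends the constituent
        idx += 1
    if current:
        result.append(current)
    return result


# "The cat sleeps on the mat" -> DT NN VBZ IN DT NN
print(segment(["DT", "NN", "VBZ", "IN", "DT", "NN"]))
# -> [['DT', 'NN', 'VBZ'], ['IN', ['DT', 'NN']]]
```

The nested list mirrors the intended parse: the preposition (a level trigger) opens a deeper constituent containing the prepositional object.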
Similar papers
Neural networks for learning grammars
The straightforward mapping of a grammar onto a connectionist architecture is to make each grammar symbol correspond to a node and each rule correspond to a pattern of connections. The grammar then expresses the 'competence' of the network. The (unsupervised) grammatical inference problem is therefore: how can a network learn to configure itself to reflect the syntactic structure in its input pa...
Unsupervised language acquisition: syntax from plain corpus
We describe results of a novel algorithm for grammar induction from a large corpus. The ADIOS (Automatic DIstillation of Structure) algorithm searches for significant patterns, chosen according to context dependent statistical criteria, and builds a hierarchy of such patterns according to a set of rules leading to structured generalization. The corpus is thus generalized into a context free gra...
The Shared Logistic Normal Distribution for Grammar Induction
We present a shared logistic normal distribution as a Bayesian prior over probabilistic grammar weights. This approach generalizes the similar use of logistic normal distributions [3], enabling soft parameter tying during inference across different multinomials comprising the probabilistic grammar. We show that this model outperforms previous approaches on an unsupervised dependency grammar ind...
Bilingually-Guided Monolingual Dependency Grammar Induction
This paper describes a novel strategy for automatic induction of a monolingual dependency grammar under the guidance of bilingually-projected dependency. By moderately leveraging the dependency information projected from the parsed counterpart language, and simultaneously mining the underlying syntactic structure of the language considered, it effectively integrates the advantages of bilingual ...
Concavity and Initialization for Unsupervised Dependency Grammar Induction
We examine models for unsupervised learning with concave log-likelihood functions. We begin with the most well-known example, IBM Model 1 for word alignment (Brown et al., 1993), and study its properties, discussing why other models for unsupervised learning are so seldom concave. We then present concave models for dependency grammar induction and validate them experimentally. Despite their sim...